Detailed protocol


Dear Zhen-Hao Luo,
Thanks for your question.


If you use the maximum likelihood (ML) version of ALE (ALEml, ALEml_undated), then the parameters will be optimised by ML during the analysis, integrating
over the different ways in which the sample of gene trees can be reconciled with the species tree (this is the version we used in the paper).


There is also an implementation that samples the parameter values by MCMC (ALEmcmc, ALEmcmc_undated), although these versions of the software are 
not actively developed at the moment. 


Instructions on running each step of the analysis can be found in the ALE documentation on GitHub (https://github.com/ssolo/ALE). Briefly, if you have a file
myGeneFamily.ufboot (containing a bootstrap sample of gene trees generated with iqtree -wbtl, or an MCMC sample of trees from your favourite Bayesian
phylogenetics software), you would first run:


ALEobserve myGeneFamily.ufboot


This produces the file myGeneFamily.ufboot.ale. Then, to fit the model by ML (here for the undated model), you would run:


ALEml_undated myRootedSpeciesTree.tre myGeneFamily.ufboot.ale fraction_missing=myFractionMissing.txt


The output file myRootedSpeciesTree.tre_myGeneFamily.ufboot.ale.uml_rec will contain (among other output) ML estimates of the duplication (D), transfer (T)
and loss (L) rates and a sample of reconciled gene trees.
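

As a convenience, the two steps above can be wrapped in a small shell helper. This is only a minimal sketch: ale_commands and ale_output_name are hypothetical helper names (not part of ALE), the file names are the placeholders used above, and all files are assumed to sit in the current directory with no spaces in their names. The helper prints the commands rather than running them, so they can be checked first.

```shell
#!/bin/sh
# Sketch only: print (not run) the two ALE commands for one gene family.
# ale_commands and ale_output_name are hypothetical helper names.

ale_commands() {
    sample="$1"    # tree sample, e.g. myGeneFamily.ufboot
    species="$2"   # rooted species tree, e.g. myRootedSpeciesTree.tre
    missing="$3"   # fraction-missing file, e.g. myFractionMissing.txt
    echo "ALEobserve ${sample}"
    echo "ALEml_undated ${species} ${sample}.ale fraction_missing=${missing}"
}

# Expected name of the reconciliation output, following the naming pattern
# described above (assumption: all files are in the current directory).
ale_output_name() {
    sample="$1"
    species="$2"
    echo "${species}_${sample}.ale.uml_rec"
}
```

To execute the printed commands instead of only inspecting them, the output can be piped into sh, e.g. ale_commands myGeneFamily.ufboot myRootedSpeciesTree.tre myFractionMissing.txt | sh. Wrapping the call in a loop over *.ufboot files would handle several gene families in one go.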


There are several command-line options that may be of interest. In our paper, we used the fraction_missing option to specify a file containing estimates of the
missing fraction of each genome, which helps to correct the rates for the missing data. Such estimates can be obtained using e.g. CheckM or BUSCO (note: the
fraction missing is 1 - completeness).
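

The conversion from completeness to fraction missing can be scripted. The sketch below makes several assumptions that are not taken from the ALE documentation: completeness_to_missing is a hypothetical helper name, the input is a header-less, tab-separated table of species name and completeness in percent (as reported by CheckM or BUSCO), and the output separator (":" by default here) should be checked against the layout that ALE actually expects for the fraction_missing file.

```shell
#!/bin/sh
# Sketch only: turn per-genome completeness (percent) into fraction missing,
# using fraction_missing = 1 - completeness / 100.
# Assumed input: tab-separated "species<TAB>completeness_percent", no header.
# Assumed output: "species<sep>fraction"; the separator is a guess and should
# be verified against the ALE documentation before use.

completeness_to_missing() {
    infile="$1"
    sep="${2:-:}"   # output separator (assumption, defaults to ":")
    awk -F '\t' -v sep="$sep" \
        'NF >= 2 { printf "%s%s%.4f\n", $1, sep, 1 - $2 / 100 }' "$infile"
}
```

For example, completeness_to_missing completeness.tsv > myFractionMissing.txt would write one line per genome; a genome that is 95.5% complete gets a fraction missing of 0.0450.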


Best, 


Gergely Szollosi and Tom Williams 